4 research outputs found

    Archiving Sound Recordings

    Some sound recordings constitute archival records, represent a cultural asset and are part of the national heritage, and as such should be protected and made available to the wider public. The best way to preserve them while ensuring availability is to archive them. This article deals with important aspects of archiving sound recordings. First, the basic terms are explained, introductory notes and a brief historical overview of the creation of sound recordings are given, the types of sound recordings are listed and described according to the carrier on which they are stored, and the role of sound recordings as information and as archival and cultural assets is explained. The second chapter describes the principles and strategies for preserving sound recordings: the steps that make up the archiving procedure; the characteristics of storage media with regard to their instability, obsolescence and sensitivity to external influences; and the best conditions for their preservation. The digitization procedure and the characteristics of the target format for archiving are briefly explained. The following chapter deals with the best-known standards for describing sound recordings, and presents a template for describing sound recordings together with an example description. The last chapter discusses the situation in Croatia with regard to the condition of sound recordings and their archiving.
Some sound recordings are part of the cultural and national heritage and as such represent archival records. They should be protected but at the same time available to the public. Over time, sound recordings have been created and played using a variety of equipment for recording and reproducing sound. Many areas of human activity that are of cultural, national and practical significance have been recorded, which is why sound recordings are counted among archival records. They are kept in the newest branch of archives: sound archives.
Storage media for sound recordings are susceptible to external influences and to damage caused by improper use, inappropriate storage or unsuitable playback equipment. Sound archives should therefore ensure the best possible conditions in order to avoid changes to the carriers and keep them in their original condition for as long as possible. Because deterioration of the carriers over time is nevertheless inevitable, the only way to preserve the sound recordings stored on them is to transfer the recordings onto new media. In recent years, the obsolescence of carriers and equipment has increasingly been handled by digitizing sound recordings and making the digitized copies available to the public, while the recordings are also kept in their original form and format. The quality of a digital recording depends mostly on how the analogue signal is sampled and quantized during conversion to digital form, and on the file format. To clarify the context and content of archival records, and thus increase their availability, archival records are described; various norms and standards exist for this purpose. The first sound recordings in Croatia were created as early as 1900. Croatia therefore has a long tradition of sound recording, but there is no institution that systematically takes care of the recordings. Sound recordings that represent archival records are stored in multiple locations, in institutions or in private collections. In some institutions they are not available for public use, and in private collections they are largely unavailable and unlisted. It is therefore of the utmost importance that a Croatian sound archive be founded. Although there have been attempts to establish a central Croatian sound archive, it still does not exist. If the situation does not change soon, many recordings could be permanently lost or destroyed.
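The dependence of digital quality on sampling and quantization mentioned in the abstract above can be illustrated with a minimal sketch. It relies only on two textbook relations: the Nyquist limit (half the sampling rate) and the theoretical signal-to-noise ratio of an ideal uniform quantizer for a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). The function names and the two example settings (44.1 kHz/16-bit and 96 kHz/24-bit) are assumptions chosen for illustration; they are not taken from the article.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing at this sampling rate."""
    return sample_rate_hz / 2.0


def ideal_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal uniform quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def pcm_size_bytes(sample_rate_hz: int, bits: int, channels: int, seconds: int) -> int:
    """Size of uncompressed linear PCM audio data, excluding container headers."""
    return sample_rate_hz * (bits // 8) * channels * seconds


if __name__ == "__main__":
    # Illustrative settings only (assumed, not prescribed by the source).
    for rate, bits in [(44_100, 16), (96_000, 24)]:
        print(
            f"{rate} Hz / {bits}-bit: "
            f"bandwidth up to {nyquist_hz(rate):.0f} Hz, "
            f"ideal SNR about {ideal_snr_db(bits):.1f} dB, "
            f"{pcm_size_bytes(rate, bits, 2, 60) / 1e6:.1f} MB per stereo minute"
        )
```

The sketch shows the trade-off an archive faces when choosing a target format: higher sampling rates and bit depths widen the bandwidth and lower the quantization noise floor, at the cost of proportionally larger files.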

    Automatic prediction and modelling of Croatian prosodic features based on text

    Human speech conveys a wide range of information contained in the accent system, intonation, duration, rhythm, pauses and speech rate; these features are often referred to collectively as prosody. No extensive research on predicting and modelling prosodic features has so far been conducted for Croatian. This dissertation investigated how applicable methods for predicting and modelling prosodic features are to Croatian, and whether they can be improved by including linguistic features and language-specific properties of Croatian, such as lexical stress. Croatian belongs to the group of restricted tone languages, in which the tonal contour realized on the stressed word carries lexical information, so a prerequisite for modelling Croatian prosody is a lexicon that covers the stress of both basic and derived word forms. Such a lexicon was therefore built as part of this dissertation. Because a lexicon cannot cover every word that appears in a text, a system was also developed for automatically assigning stress to words that are not in the lexicon. The system is based on a model trained on data from the stress lexicon. The thesis also includes an analysis of syllable duration in Croatian and a syllable duration model. The Tilt intonation model was applied to model the F0 contour, and for that purpose a corpus of 500 sentences was annotated with Tilt labels.
Because of the many roles of prosody in human communication, its prediction and modelling are important and can be applied in many areas of natural language processing, such as automatic speech recognition, speech synthesis, automatic speaker and language identification, topic boundary detection, detection of the emotional states of participants in communication, machine-aided translation systems, computer-assisted language learning systems, etc.
Human speech conveys a wide range of information through pitch accent, intonation, duration, rhythm, pauses and speech rate, and these characteristics are often collectively referred to as prosody. Because of the many roles of prosody in human communication, predicting and modelling it is important and can be applied in many areas of natural language processing, such as automatic speech recognition, speech synthesis, automatic identification of speakers and languages, and detection of emotional states. Prior to this research, no extensive work on predicting and modelling prosodic characteristics had been conducted for the Croatian language. In this doctoral thesis, the applicability of methods for predicting and modelling prosodic features was tested for Croatian. The possibility of improving their performance by including linguistic features and properties specific to Croatian (for example, lexical stress) was also explored. Croatian is a pitch accent language in which the tone contour realized on prominent words carries lexical information. A prerequisite for modelling the prosody of Croatian is therefore a lexicon in which the lexical stress of both basic and derived word forms is marked. Such a lexicon was created by implementing rules that construct derived word forms by adding the appropriate ending and, where necessary, moving the place of stress.
The entries in the lexicon comprise all derived word forms, each written both without and with its stress mark, together with its morphosyntactic description (MSD) or part-of-speech (POS) tag. Croatian is an under-resourced language, so the lexicon is expected to be important and widely applicable in various fields of natural language processing. The lexicon comprises 72,366 words in their basic form and over 1,000,000 derived word forms. Besides the lexicon, the implementation of the rules for constructing derived word forms also yielded a system for automatic stress assignment for Croatian. The accuracy of the rule-based system was tested by comparing its output on a text with the same text in which the stress had been assigned by an expert. The results are very good: accuracy is 78% if the MSD tags are assigned to the words automatically, and 87.7% if the MSD tags are corrected by hand. Some words in Croatian are written as separate words but carry no stress of their own; prosodically they lean on the following or preceding word. Such words are called clitics (proclitics and enclitics). In some cases in Croatian, the stress moves from the word that usually bears it to the proclitic. Those rules were also implemented in the system, which increased its accuracy to 92.8%. Sometimes words from the text cannot be found in the lexicon. For such cases, a system for automatic lexical stress assignment was developed. The system consists of two models trained on data from the lexicon described above. One model was trained to predict the place of the stress and the other to predict the category of the stress (there are four possible stress categories in Croatian).
The accuracy of the model for predicting the place of the stress, measured by tenfold cross-validation, is 90.56%, and the accuracy of the model for predicting the category of the stress is 86.02%. The models were also tested on the text that was used to evaluate the rule-based system. The accuracy achieved there is 97.4% for the place of the stress, 82.4% for the category of the stress, and 80.1% for place and category together. The rule-based system achieved better accuracy than the model-based system for automatic stress assignment. However, because some words were left without stress after the rule-based system was applied, the model-based system was used in those cases as a supplement to the rule-based system. This hybrid approach achieved an accuracy of 95.3%. The thesis also includes an analysis of syllable duration in Croatian and a duration model. The position of the syllable within the word and the sentence was found to affect its duration. On average, the duration of a syllable increased by 41.4% relative to the reference value if the syllable was at the beginning of the word, and by 37.0% if it was at the end of the word. If the syllable was at the beginning of the sentence, its duration increased by 71.8% relative to the reference value, and by 104.75% if it was at the end of the sentence. The analysis also showed that contextual features affect syllable duration: the duration increased by different percentages depending on the category of the consonants that followed the observed syllable.
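The hybrid stress assignment described above (rules first, models only for words the rules leave unstressed) can be sketched as a simple fallback. This is an illustrative outline, not the thesis implementation: the function names, the toy lexicon and the stand-in model are all hypothetical, and the stress values in the toy lexicon are placeholders for demonstration.

```python
from typing import Callable, Dict, Optional, Tuple

# (index of the stressed syllable, stress category); both fields are illustrative.
Stress = Tuple[int, str]


def assign_stress(
    word: str,
    rule_based: Callable[[str], Optional[Stress]],
    model_based: Callable[[str], Stress],
) -> Tuple[Stress, str]:
    """Use the rule-based result when there is one, otherwise fall back to the model."""
    result = rule_based(word)
    if result is not None:
        return result, "rules"
    return model_based(word), "model"


# Hypothetical toy lexicon standing in for the rule-based system.
TOY_LEXICON: Dict[str, Stress] = {
    "voda": (0, "short-rising"),
    "ruka": (0, "long-rising"),
}


def toy_rules(word: str) -> Optional[Stress]:
    """Return the stress from the toy lexicon, or None if the word is not covered."""
    return TOY_LEXICON.get(word)


def toy_model(word: str) -> Stress:
    """Stand-in for the two trained models: always guesses the first syllable."""
    return (0, "short-falling")


if __name__ == "__main__":
    for w in ["voda", "nepoznato"]:
        print(w, assign_stress(w, toy_rules, toy_model))
```

Returning the source of each decision ("rules" or "model") alongside the stress makes it easy to evaluate the two components separately, as the abstract does.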
Three categories of features were taken into consideration in the duration model developed for Croatian: positional features, contextual features and features related to stress. First, the accuracy of the duration model was tested with all three categories included. Then the accuracy was tested after leaving out one category at a time, in order to determine how much each category contributes to the accuracy of the duration model. All three categories were found to affect the accuracy of the model to some degree, with the positional features having the greatest impact. For intonation modelling of Croatian, the Tilt intonation model was applied. For that purpose, a database of 500 sentences was labelled with the corresponding Tilt labels. The best RMSE obtained by comparing the generated F0 contour to the original is 22.2.
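For readers unfamiliar with the RMSE figure reported above, the following minimal sketch shows how a root-mean-square error between a generated and an original F0 contour is typically computed, assuming both contours are sampled at the same time points. The function name and the sample values are hypothetical; the abstract does not state the unit or the sampling used.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equally long, time-aligned contours."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up F0 values, for illustration only.
    generated = [120.0, 130.0, 125.0]
    original = [117.0, 134.0, 125.0]
    print(f"RMSE = {rmse(generated, original):.2f}")
```

A lower value means the generated contour follows the original more closely; because errors are squared before averaging, a few large deviations weigh more than many small ones.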

    An insight into the automatic extraction of metaphorical collocations (Uvid u automatsko izlučivanje metaforičkih kolokacija)

    Collocations have been the subject of much scientific research over the years. The focus of this research is on a subset of collocations, namely metaphorical collocations. In metaphorical collocations, a semantic shift has taken place in one of the components, i.e., one of the components takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of existing research on collocation extraction, as well as of existing methods, measures, and resources. The existing research is classified according to approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. The insights gained from existing research serve as a first step in exploring the possibility of developing a method for automatic extraction of metaphorical collocations. The methods, tools, and resources that were used in previous research and may prove useful for future work are highlighted.

    A general framework for detecting metaphorical collocations

    This paper aims to identify a specific set of collocations known as metaphorical collocations. In this type of collocation, a semantic shift has taken place in one of the components. Since an appropriate gold standard needs to be compiled before any serious attempt to extract metaphorical collocations automatically, this paper first presents the steps taken to compile it, and then establishes an appropriate evaluation framework. The process of compiling the gold standard is illustrated on one of the most frequent Croatian nouns, which resulted in the preliminary relation significance set. To investigate whether the process can be facilitated, frequency, logDice, relation, and pretrained word embeddings are used as features in a classification task conducted on the logDice-based word sketch relation lists. Preliminary results are presented.
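The logDice score named above as a feature is an association measure computed from corpus frequencies. A minimal sketch of its usual definition, 14 + log2(2·f_xy / (f_x + f_y)), follows; the function name and the example counts are hypothetical, and the paper's own feature extraction pipeline is not described in the abstract.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score from co-occurrence and individual frequencies.

    f_xy: how often the two words co-occur (in the given relation)
    f_x, f_y: how often each word occurs on its own
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(f"{log_dice(f_xy=150, f_x=4_000, f_y=900):.2f}")
```

The theoretical maximum is 14, reached when the two words only ever occur together. Because the score depends only on the frequencies of the two words and of the pair, not on total corpus size, values remain comparable across corpora of different sizes.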